Abstract


Background: The newly emerged Middle East respiratory syndrome coronavirus (MERS-CoV), which first appeared in
Saudi Arabia during the summer of 2012, has to date (20th September 2013) caused 58 human deaths. MERS-CoV
utilizes the dipeptidyl peptidase 4 (DPP4) host cell receptor, and analysis of the long-term interaction between virus
and receptor provides key information on the evolutionary events that led to viral emergence.


Findings: We show that bat DPP4 genes have been subject to significant adaptive evolution, suggestive of a
long-term arms race between bats and MERS-related CoVs. In particular, we identify three positively selected
residues in DPP4 that directly interact with the viral surface glycoprotein.


Conclusions: Our study suggests that the evolutionary lineage leading to MERS-CoV may have circulated in 
bats for a substantial time period. 
Main text

Middle East respiratory syndrome coronavirus (MERS-CoV) [1],
first described by the World Health Organization (WHO) on
23rd September 2012 [2,3], has to date (20th September 2013)
caused 130 laboratory-confirmed human infections with 58 deaths
(http://www.who.int/csr/don/2013_09_20/en/index.html).
MERS-CoV belongs to lineage C of the genus Betacoronavirus in
the family Coronaviridae, and is closely related to Tylonycteris
bat coronavirus HKU4 (BtCoV-HKU4), Pipistrellus bat coronavirus
HKU5 (BtCoV-HKU5) [4,5] and CoVs in Nycteris bats [6],
suggestive of a bat origin [6]. Unlike severe acute respiratory
syndrome (SARS) CoV, which uses the angiotensin-converting
enzyme 2 (ACE2) receptor for cell entry [7], MERS-CoV employs
the dipeptidyl peptidase 4 receptor (DPP4; also known as CD26),
and recent work has demonstrated that expression of both human
and bat DPP4 in non-susceptible cells enabled viral entry [8].
Cell-surface receptors such as DPP4 play a key role in
facilitating viral invasion and tropism. As a consequence,
the long-term co-evolutionary dynamics between hosts and
viruses often leave evolutionary footprints in both the
receptor-encoding genes of hosts and the receptor-binding
domains (RBDs) of viruses in the form of positively selected
amino acid residues (i.e. adaptive evolution). For example,
signatures of recurrent positive selection have been observed
in ACE2 genes in bats [9], supporting the past circulation of
SARS-related CoVs in bats. To better understand the origins of
MERS-CoV, as well as its potentially long-term evolutionary
dynamics with bat hosts [5,10] (as opposed to a short-term
association, which would lack sustained virus-host interaction),
we studied the molecular evolution of DPP4 across the
mammalian phylogeny.

Table 1 Sequences used in the evolutionary analysis of DPP4

Common name             Species name                 Family            Accession no.
Sheep                   Ovis aries                   Bovidae           XM_004004660
Killer whale            Orcinus orca                 Delphinidae       XM_004283621
Cow                     Bos taurus                   Bovidae           NM_174039
Pig                     Sus scrofa                   Suidae            NM_214257
Pacific walrus          Odobenus rosmarus divergens  Odobenidae        XM_004410199
Ferret                  Mustela putorius furo        Mustelidae        DQ266376
Cat                     Felis catus                  Felidae           NM_001009838
Horse                   Equus caballus               Equidae           XM_001493999
Rhinoceros              Ceratotherium simum          Rhinocerotidae    XM_004428264
Large flying fox        Pteropus vampyrus            Pteropodidae      ENSPVAG00000002634
Black flying fox        Pteropus alecto              Pteropodidae      KB031068
Common vampire bat      Desmodus rotundus            Phyllostomidae    GABZ01004546
Brandt's bat            Myotis brandtii              Vespertilionidae  KE161360
David's myotis          Myotis davidii               Vespertilionidae  KB109552
Little brown bat        Myotis lucifugus             Vespertilionidae  GL429772
Common pipistrelle      Pipistrellus pipistrellus    Vespertilionidae  KC249974
Guinea pig              Cavia porcellus              Caviidae          XM_003478564
Degu                    Octodon degus                Octodontidae      XM_004629976
Lesser Egyptian jerboa  Jaculus jaculus              Dipodidae         XM_004651712
Mouse                   Mus musculus                 Muridae           BC022183
Rat                     Rattus norvegicus            Muridae           NM_012789
Human                   Homo sapiens                 Hominidae         NM_001935
Chimpanzee              Pan troglodytes              Hominidae         GABE01002695
Pygmy chimpanzee        Pan paniscus                 Hominidae         XM_003820939
Gorilla                 Gorilla gorilla gorilla      Hominidae         XM_004032706
Orangutan               Pongo abelii                 Hominidae         NM_001132869
Gibbon                  Nomascus leucogenys          Hylobatidae       XM_003266171
Olive baboon            Papio anubis                 Cercopithecidae   XM_003907539
Rhesus monkey           Macaca mulatta               Cercopithecidae   JU474559
Galago                  Otolemur garnettii           Galagidae         XM_003795172
Marmoset                Callithrix jacchus           Cebidae           XM_002749392
American pika           Ochotona princeps            Ochotonidae       XM_004577330

We first analyzed the selection pressures acting on bat
DPP4 genes using the ratio of nonsynonymous (dN) to
synonymous (dS) nucleotide substitutions per site (ratio
dN/dS), with dN > dS indicative of adaptive evolution. The
complete DPP4 mRNA sequence of the common pipistrelle
(Pipistrellus pipistrellus) was downloaded from GenBank
(www.ncbi.nlm.nih.gov/genbank/) along with that of the
common vampire bat (Desmodus rotundus) from a
transcriptome database
(http://www.ncbi.nlm.nih.gov/bioproject/178123). These
sequences were then used to mine and extract DPP4 mRNA
transcripts from a further five bat genomes (Table 1) using
tBLASTn and GeneWise [11]. The complete DPP4 genes of bats
and non-bat reference genomes from a range of mammalian
species (Table 1) were aligned using MUSCLE [12] guided by
translated amino acid sequences (n = 32; 727 amino acids).
We then compared a series of models within a maximum
likelihood framework [13], incorporating the published
mammalian species tree [14-16]. This analysis (the Free Ratio
model) revealed that the dN/dS value on the bat lineage (0.96)
was four times greater than the mammalian average (Figure 1).
The higher dN/dS ratios on the lineages leading to bats
(Table 2) during mammalian evolution accord with the growing
body of data [5,6,17,18] indicating that the newly emerged
MERS-CoV ultimately has a bat origin.
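
The dN/dS criterion used above can be sketched in a few lines. This is a minimal illustration and not the maximum likelihood procedure used in the analysis: the function names `dn_ds_ratio` and `interpret` are hypothetical, the inputs are the rounded per-lineage values listed in Table 2, and ratios recomputed from those rounded values need not match the tabulated ratios exactly. A zero dS is returned as undetermined, mirroring the ND entries in Table 2.

```python
from typing import Optional


def dn_ds_ratio(dn: float, ds: float) -> Optional[float]:
    """Return dN/dS, or None when dS is zero (the ratio is undefined)."""
    if dn < 0 or ds < 0:
        raise ValueError("substitution rates must be non-negative")
    if ds == 0:
        return None
    return dn / ds


def interpret(omega: Optional[float]) -> str:
    """Label a ratio using the convention that dN > dS suggests adaptation."""
    if omega is None:
        return "not determined"
    if omega > 1:
        return "dN > dS: consistent with adaptive evolution"
    if omega < 1:
        return "dN < dS: consistent with purifying selection"
    return "dN = dS: consistent with neutral evolution"


# Rounded (dN, dS) pairs copied from Table 2; the variable name is hypothetical.
lineages = {
    "Pig": (0.027, 0.109),
    "Common pipistrelle": (0.031, 0.066),
    "Pygmy chimpanzee": (0.001, 0.000),
}

for name, (dn, ds) in lineages.items():
    omega = dn_ds_ratio(dn, ds)
    shown = "ND" if omega is None else f"{omega:.3f}"
    print(f"{name}: dN/dS = {shown} ({interpret(omega)})")
```

A lineage-wide ratio below 1 does not rule out positive selection at individual codons, because the ratio averages over all sites in the gene.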
Figure 1 Selection pressures on DPP4 during mammalian
evolution. Ratios of nonsynonymous (dN) to synonymous (dS)
nucleotide substitutions per site (dN/dS) are shown on four major
ancestral branches; dN and dS numbers are also given in parentheses.
Values for individual lineages are given in Table 2. DPP4 sequences of
bat origin are shaded.
[Tree image not reproduced. Labelled groups: bats (common vampire bat,
common pipistrelle, David's myotis, Brandt's bat, little brown bat,
black flying fox, large flying fox), non-bat Laurasiatheria (including
rhinoceros), rodents, primates and American pika. One branch
annotation survives as 0.119 (0.003, 0.023).]

We next analysed the selection pressures at individual
amino acid sites in bat DPP4. Using the Bayesian FUBAR
method [19] in the HyPhy package [20], we identified six
codons that were assigned dN/dS > 1 with high posterior
probability (a strict cut-off of 95% in this analysis)
(Table 3). To identify those sites under positive selection
that may interact directly with a MERS-CoV-like spike
protein, bat DPP4 (from the common pipistrelle) was
modelled against the structure of the human DPP4/MERS-CoV
spike complex [21] (Figure 2A). This revealed that three of
the six positively selected residues (positions 187, 288 and
392) were located at the interface between bat DPP4 and the
MERS-CoV RBD (receptor-binding domain) (Figure 2). These
residues therefore provide direct evidence of a long-term
co-evolutionary history between viruses and their hosts. We
also observed several variable regions (Figure 2B) within the
bat RBD that may also have resulted from virally induced
selection pressure and which merit additional investigation
in a larger data set.
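
The site-selection step described above, a posterior probability cut-off followed by a check against the modelled interface, can be illustrated as follows. This is a sketch under stated assumptions and does not reproduce HyPhy's output format or API: `SiteResult`, `select_sites` and the variable names are hypothetical, the cut-off is treated as inclusive (an assumption), only the Table 3 rows whose values are legible are included, and the interface positions are simply those reported in the text rather than computed from a structure.

```python
from typing import List, NamedTuple, Set


class SiteResult(NamedTuple):
    codon: int        # codon position, human DPP4 numbering
    posterior: float  # posterior probability that dN/dS > 1


def select_sites(results: List[SiteResult], cutoff: float = 0.95) -> List[int]:
    """Return codons whose posterior probability meets the cut-off (inclusive)."""
    return [r.codon for r in results if r.posterior >= cutoff]


# Legible rows of Table 3 only: (codon position, posterior probability).
table3_rows = [
    SiteResult(46, 0.97),
    SiteResult(187, 0.95),
    SiteResult(288, 0.98),
    SiteResult(392, 0.97),
]

# Interface positions as reported in the text; not derived from a structure here.
reported_interface: Set[int] = {187, 288, 392}

selected = select_sites(table3_rows)
at_interface = [c for c in selected if c in reported_interface]
elsewhere = [c for c in selected if c not in reported_interface]

print("selected:", selected)
print("at interface:", at_interface)
print("elsewhere:", elsewhere)
```

Keeping the cut-off as a parameter makes it easy to see how the set of candidate sites changes under a stricter or looser threshold.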

Our analysis therefore suggests that the evolutionary
lineage leading to current MERS-CoV co-evolved with
bat hosts for an extended time period, eventually
jumping species boundaries to infect humans, perhaps
through an intermediate host. As such, the emergence of
Table 2 Numbers of nonsynonymous (dN) and synonymous
(dS) substitutions per site in DPP4 genes of different mammals

Common name             dN      dS      dN/dS
Sheep                   0.004   0.013   0.280
Killer whale            0.023   0.039   0.595
Cow                     0.003   0.016   0.157
Pig                     0.027   0.109   0.246
Pacific walrus          0.014   0.053   0.260
Ferret                  0.015   0.064   0.235
Cat                     0.021   0.081   0.258
Horse                   0.016   0.055   0.290
Rhinoceros              0.017   0.044   0.385
Large flying fox        0.005   0.001   3.561
Black flying fox        0.004   0.008   0.487
Common vampire bat      0.042   0.125   0.500
Brandt's bat            0.006   0.012   0.463
David's myotis          0.010   0.028   0.380
Little brown bat        0.007   0.007   0.943
Common pipistrelle      0.031   0.066   0.470
Guinea pig              0.018   0.078   0.238
Degu                    0.016   0.128   0.122
Lesser Egyptian jerboa  0.023   0.179   0.131
Mouse                   0.019   0.093   0.206
Rat                     0.027   0.110   0.248
Human                   0.001   0.007   0.086
Chimpanzee              0.000   0.002   0.000
Pygmy chimpanzee        0.001   0.000   ND
Gorilla                 0.003   0.004   0.863
Orangutan               0.002   0.000   ND
Gibbon                  0.003   0.009   0.344
Olive baboon            0.000   0.005   0.000
Rhesus monkey           0.000   0.004   0.000
Galago                  0.022   0.149   0.149
Marmoset                0.009   0.053   0.160
American pika           0.036   0.229   0.156

ND: Not determined because no synonymous substitutions are present.


Table 3 Putatively positively selected DPP4 codons in bats

Codon position (a)  Posterior probability (b)  dN/dS
46                  0.97                       14.95
oy                  0.97                       i313
2                   0.94                       27
187                 0.95                       G00
288                 0.98                       13.90
392                 0.97                       14.63

(a) Codon position corresponding to the human DPP4 (NP_001926) protein sequence.
(b) Posterior probability of residues assigned a dN/dS ratio greater than 1.
Figure 2 Interaction of bat DPP4 and MERS-CoV spike protein receptor-binding domain and the location of positively selected sites.
The structure was displayed using PyMol v1.6 (http://www.pymol.org/). (A) Homology model showing the structural interactions between bat
DPP4 (from common pipistrelle), coloured grey, and the MERS-CoV spike protein receptor-binding domain, coloured blue. The three positively selected
residues (positions 187, 288 and 392) located within the interface where virus and host interact are highlighted in red. (B) Protein alignment of
human DPP4 compared to that of seven bat species showing the RBD spanning codons 41-400. Conserved and variable positions are shown in
black and grey text, respectively, and residues under positive selection are coloured red.